Community detection based on links and node features in social networks
© Springer International Publishing Switzerland 2015. Community detection is a significant but challenging task in social network analysis. Many effective methods have been proposed to solve this problem; however, most of them rely mainly on either the topological structure or the node attributes. In this paper, building on SPAEM [1], we propose a joint probabilistic model for community detection that combines node attributes and topological structure. In our model, we create a novel feature-based weighted network in which the weight of each edge is the feature similarity between its two endpoint nodes. We then fuse the original network and the created network through a parameter and employ the expectation-maximization (EM) algorithm to identify communities. Experiments on a diverse set of data collected from Facebook and Twitter show that our algorithm achieves promising results compared with other algorithms.
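The fusion step described above can be illustrated with a small sketch. It assumes cosine similarity as the node-feature similarity and a linear mix `(1 - lam) * 1 + lam * sim` as the fusion rule. The abstract specifies neither, and the names `fused_weights` and `lam` are hypothetical. Only edges of the original network receive a feature weight, and the EM inference inherited from SPAEM is not shown.

```python
import math

def cosine(u, v):
    """Cosine similarity of two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def fused_weights(edges, features, lam):
    """Weight each existing edge by mixing topology (weight 1) with feature similarity.

    edges    -- undirected (u, v) pairs from the original network
    features -- dict mapping node -> list of numeric attributes
    lam      -- hypothetical fusion parameter in [0, 1]; 0 keeps topology only
    """
    weights = {}
    for u, v in edges:
        sim = cosine(features[u], features[v])          # feature-based edge weight
        weights[(u, v)] = (1 - lam) * 1.0 + lam * sim   # assumed linear fusion rule
    return weights

edges = [("a", "b"), ("b", "c")]
features = {"a": [1, 0], "b": [1, 0], "c": [0, 1]}
print(fused_weights(edges, features, lam=0.5))  # {('a', 'b'): 1.0, ('b', 'c'): 0.5}
```

With `lam = 0.5`, the edge between the two nodes with identical features keeps full weight, and the edge between dissimilar nodes is halved. A community-detection step would then run on these weights.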
A Knowledge-Based Semantic Kernel for Text Classification
Abstract. Typically, in textual document classification, documents are represented in the vector space using the "Bag of Words" (BOW) approach. Despite its ease of use, the BOW representation cannot handle word synonymy and polysemy and does not consider semantic relatedness between words. In this paper, we overcome these shortcomings of the BOW approach by embedding a known WordNet-based semantic relatedness measure for pairs of words, namely Omiotis, into a semantic kernel. The suggested measure incorporates the TF-IDF weighting scheme, thus creating a semantic kernel that combines both semantic and statistical information from text. Empirical evaluation with real data sets demonstrates that our approach achieves improved classification accuracy with respect to the standard BOW representation when Omiotis is embedded in four different classifiers.
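Below is a minimal sketch of a semantic kernel of this kind. It assumes the common form k(x, y) = x^T R y over TF-IDF vectors, where R is a word-relatedness matrix. Omiotis itself is not reproduced: the relatedness values are hypothetical placeholders, and the paper's exact kernel construction may differ. Setting R to the identity reduces the kernel to the plain BOW (linear) kernel.

```python
import math
from collections import Counter

def tfidf_vectors(docs, vocab):
    """TF-IDF vectors (raw count x log(N / df)) for tokenised documents over a fixed vocabulary."""
    n = len(docs)
    df = {w: sum(1 for d in docs if w in d) for w in vocab}
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append([tf[w] * math.log(n / df[w]) if df[w] else 0.0 for w in vocab])
    return vectors

def semantic_kernel(x, y, rel):
    """k(x, y) = x^T R y; rel[i][j] is a (hypothetical) relatedness score between words i and j."""
    return sum(x[i] * rel[i][j] * y[j] for i in range(len(x)) for j in range(len(y)))

vocab = ["car", "automobile", "fruit"]
docs = [["car"], ["automobile"], ["fruit"]]
vecs = tfidf_vectors(docs, vocab)
identity = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
# Hypothetical relatedness matrix (symmetric, positive semidefinite) standing in for Omiotis.
rel = [[1.0, 0.8, 0.0],
       [0.8, 1.0, 0.0],
       [0.0, 0.0, 1.0]]
print(semantic_kernel(vecs[0], vecs[1], identity))  # BOW: the documents share no words
print(semantic_kernel(vecs[0], vecs[1], rel))       # semantic: related words now contribute
```

Under the identity matrix the "car" and "automobile" documents have zero similarity. The hypothetical 0.8 relatedness gives them a positive kernel value. Keeping R positive semidefinite keeps k a valid kernel.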
Kernel Spectral Clustering and applications
In this chapter we review the main literature related to kernel spectral
clustering (KSC), an approach to clustering cast within a kernel-based
optimization setting. KSC represents a least-squares support vector machine
based formulation of spectral clustering described by a weighted kernel PCA
objective. Just as in the classifier case, the binary clustering model is
expressed by a hyperplane in a high dimensional space induced by a kernel. In
addition, the multi-way clustering can be obtained by combining a set of binary
decision functions via an Error Correcting Output Codes (ECOC) encoding scheme.
Because of its model-based nature, the KSC method encompasses three main steps:
training, validation, testing. In the validation stage model selection is
performed to obtain tuning parameters, like the number of clusters present in
the data. This is a major advantage compared to classical spectral clustering,
where the determination of the clustering parameters is unclear and relies on
heuristics. Once a KSC model is trained on a small subset of the entire data,
it is able to generalize well to unseen test points. Beyond the basic
formulation, sparse KSC algorithms based on the Incomplete Cholesky
Decomposition (ICD) and on several regularization schemes, including Group
Lasso, are reviewed. In that respect, we show how it is possible to handle large-scale
data. Also, two possible ways to perform hierarchical clustering and a soft
clustering method are presented. Finally, real-world applications such as image
segmentation, power load time-series clustering, document clustering and big
data learning are considered. Comment: chapter contribution to the book "Unsupervised Learning Algorithms"
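The ECOC decoding step can be sketched as follows. The sketch assumes that each point's k - 1 score variables, its outputs from the binary decision functions, have already been computed by a trained KSC model. That training (the weighted kernel PCA eigenproblem) is not shown. Following the usual KSC recipe, training scores are binarised by sign, and the k most frequent codewords form the codebook. A new point joins the cluster whose codeword is closest in Hamming distance. The function names and toy scores are hypothetical.

```python
from collections import Counter

def sign_code(scores):
    """Binarise the k-1 score variables of one point into a +1/-1 codeword."""
    return tuple(1 if s >= 0 else -1 for s in scores)

def build_codebook(train_scores, k):
    """Use the k most frequent training codewords as cluster prototypes."""
    counts = Counter(sign_code(s) for s in train_scores)
    return [code for code, _ in counts.most_common(k)]

def hamming(a, b):
    """Number of positions in which two codewords differ."""
    return sum(x != y for x, y in zip(a, b))

def assign(scores, codebook):
    """Index of the cluster whose codeword is nearest (Hamming) to the point's sign pattern."""
    code = sign_code(scores)
    return min(range(len(codebook)), key=lambda i: hamming(code, codebook[i]))

# Hypothetical training scores for k = 3 clusters (k - 1 = 2 binary models).
train = [(0.9, 0.8), (0.7, 0.5), (0.8, 0.9),
         (-0.6, 0.4), (-0.9, 0.7),
         (0.2, -0.8)]
codebook = build_codebook(train, k=3)
print(codebook)                       # [(1, 1), (-1, 1), (1, -1)]
print(assign((-0.5, 0.3), codebook))  # 1: an unseen point with sign pattern (-1, 1)
```

Ties between equally distant codewords are broken here by taking the lowest index, which is an arbitrary choice.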
Quantitative Concept Analysis
Formal Concept Analysis (FCA) begins from a context, given as a binary
relation between some objects and some attributes, and derives a lattice of
concepts, where each concept is given as a set of objects and a set of
attributes, such that the first set consists of all objects that satisfy all
attributes in the second, and vice versa. Many applications, though, provide
contexts with quantitative information, telling not just whether an object
satisfies an attribute, but also quantifying this satisfaction. Contexts in
this form arise as rating matrices in recommender systems, as occurrence
matrices in text analysis, as pixel intensity matrices in digital image
processing, etc. Such applications have attracted a lot of attention, and
several numeric extensions of FCA have been proposed. We propose the framework
of proximity sets (proxets), which subsume partially ordered sets (posets) as
well as metric spaces. One feature of this approach is that it extracts from
quantified contexts quantified concepts, and thus allows full use of the
available information. Another feature is that the categorical approach allows
analyzing any universal properties that the classical FCA and the new versions
may have, and thus provides structural guidance for aligning and combining the
approaches. Comment: 16 pages, 3 figures, ICFCA 201
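The classical (binary) notion of a formal concept described above can be made concrete with a brute-force sketch. Each subset of objects is closed under the two derivation operators, and every resulting (extent, intent) pair is a concept. This covers only classical FCA, not the proxet-based quantitative framework the paper proposes. The toy context and function names are hypothetical, and the enumeration is exponential in the number of objects, so it suits only tiny examples.

```python
from itertools import combinations

def common_attributes(objs, attributes, context):
    """Derivation A -> A': attributes satisfied by every object in objs."""
    return frozenset(a for a in attributes if all((o, a) in context for o in objs))

def common_objects(attrs, objects, context):
    """Derivation B -> B': objects satisfying every attribute in attrs."""
    return frozenset(o for o in objects if all((o, a) in context for a in attrs))

def concepts(objects, attributes, context):
    """All formal concepts (A, B) with A' = B and B' = A, found by closing every object subset."""
    found = set()
    for r in range(len(objects) + 1):
        for subset in combinations(objects, r):
            intent = common_attributes(subset, attributes, context)
            extent = common_objects(intent, objects, context)
            found.add((extent, intent))
    return found

objects = ["duck", "fish", "dog"]
attributes = ["flies", "swims", "barks"]
# Binary context: the (object, attribute) pairs that hold.
context = {("duck", "flies"), ("duck", "swims"), ("fish", "swims"), ("dog", "barks")}
for extent, intent in sorted(concepts(objects, attributes, context), key=lambda c: len(c[0])):
    print(sorted(extent), sorted(intent))
```

In a quantitative context the membership test `(o, a) in context` would become a graded value. That is the information the binary version discards.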
Parallel computing in information retrieval - An updated review
The progress of parallel computing in Information Retrieval (IR) is reviewed. In particular, we stress the importance of the motivation for using parallel computing in text retrieval. We analyse parallel IR systems using a classification due to Rasmussen [1] and describe some parallel IR systems. We describe the retrieval models used in parallel information processing and identify areas where we believe further research is needed.
A Linear-Algebraic Technique with an Application in Semantic Image Retrieval
This paper presents a novel technique for learning the underlying structure that links visual observations with semantics. The technique, inspired by a text-retrieval technique known as cross-language latent semantic indexing, uses linear algebra to learn the semantic structure linking image features and keywords from a training set of annotated images. This structure can then be applied to unannotated images, making it possible to search them by keyword. This factorisation approach is shown to perform well, even when using only simple global image features.
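Below is a minimal sketch in the spirit of cross-language latent semantic indexing. It assumes each training image is a column that stacks its visual feature vector on top of a 0/1 keyword vector. A truncated SVD of this terms-by-images matrix gives a shared latent space. An unannotated image (visual part only) and a single-keyword query are both folded in as pseudo-documents with U_k^T and compared by cosine similarity. This is a generic CL-LSI construction, not necessarily the paper's exact factorisation, and the function names and toy data are hypothetical.

```python
import numpy as np

def train_space(features, keywords, rank):
    """Learn a latent space from annotated images.

    features -- (n_images, n_feat) visual feature matrix
    keywords -- (n_images, n_kw) 0/1 keyword annotations
    Returns the rank-k left singular vectors split into visual rows and keyword rows.
    """
    joint = np.hstack([features, keywords]).T        # terms x images
    u, _, _ = np.linalg.svd(joint, full_matrices=False)
    u_k = u[:, :rank]                                 # singular values are sorted descending
    n_feat = features.shape[1]
    return u_k[:n_feat], u_k[n_feat:]

def keyword_scores(unannotated, keyword_index, u_feat, u_kw):
    """Cosine similarity between each unannotated image and one keyword in the latent space."""
    images = unannotated @ u_feat                     # fold in visual part: U_k^T [f; 0]
    query = u_kw[keyword_index]                       # fold in keyword query: U_k^T e_kw
    norms = np.linalg.norm(images, axis=1) * np.linalg.norm(query)
    return (images @ query) / np.where(norms == 0, 1.0, norms)

# Toy training set: two "sky" images and two "grass" images (hypothetical data).
features = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
keywords = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
u_feat, u_kw = train_space(features, keywords, rank=2)

unannotated = np.array([[0.9, 0.1], [0.0, 1.0]])
print(keyword_scores(unannotated, 0, u_feat, u_kw))  # scores for keyword 0 ("sky")
```

The first unannotated image, whose features resemble the "sky" training images, scores close to 1 for keyword 0, while the second scores 0. No keyword annotation is needed at query time, which is the point of learning the joint structure.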